[Figure 3.28 panels: (a) Weight oscillation of ReActNet; (b) Weight distribution of ReActNet; (c) Illustration of weight oscillation.]
FIGURE 3.28
(a) We show the epoch-wise weight oscillation of ReActNet. (b) We randomly select two
channels of the first 1-bit layer in ReActNet [158]. The distribution has three peaks
centered around {−1, 0, +1}, which magnifies the non-parametric scaling factor (red line).
(c) We illustrate the weight oscillation caused by such inappropriate scale calculation, where
w and L indicate the latent weight and network loss function (blue line), respectively.
a result, we apply this set of hyperparameters to the remaining experiments in this chapter.
Note that the recurrent model has no effect when τ is set to 1.
3.9
ReBNN: Resilient Binary Neural Network
Conventional BNNs [199, 158] are often sub-optimized due to their intrinsic frequent weight
oscillation during training. We first identify that the weight oscillation mainly originates
from the non-parametric scaling factor. Figure 3.28(a) shows the epoch-wise oscillation4
of ReActNet, where the weight oscillation persists even after the network has converged.
As shown in Fig. 3.28(b), the conventional ReActNet [158] possesses a channel-wise tri-
modal distribution in the 1-bit convolution layers, whose peaks, respectively, center around
{−1, 0, +1}. This distribution leads to a magnified scaling factor α, and thus the quantized
weights ±α are much larger than the small weights around 0, which might cause the weight
oscillation. As illustrated in Fig. 3.28(c), in BNNs the real-valued latent tensor is binarized
by the sign function and scaled by the scaling factor (the orange dot) in forward propagation.
In backward propagation, the gradient is calculated based on the quantized value ±α (indicated
by the yellow dotted line). However, the gradient of small latent weights is misleading
when weights around ±1 magnify the scaling factor, as in ReActNet (Fig. 3.28(b)). The
update is then applied to the latent value (the black dot), leading to latent weight
oscillation. Because BNNs have only minimal representation states, latent weights with small
magnitudes frequently oscillate during the non-convex optimization.
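To make this mechanism concrete, the following PyTorch sketch mimics the scheme just described: binarization with the sign function, a non-parametric channel-wise scaling factor computed as the mean of the absolute latent weights (XNOR-Net-style), and a straight-through estimator in the backward pass. It is an illustration of the mechanism rather than the code of ReActNet [158]; the tensor shapes and noise levels are arbitrary assumptions.

```python
# Illustrative sketch (not the ReActNet code): binarize latent weights with
# sign(), scale by the non-parametric channel-wise factor alpha = mean(|w|),
# and back-propagate with a straight-through estimator. With a tri-modal latent
# distribution, alpha is dominated by the weights near +/-1, so the quantized
# values +/-alpha dwarf the latent weights near 0.
import torch


class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # per output channel
        return alpha * torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # straight-through estimator: pass gradients through where |w| <= 1
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)


# toy latent weights with peaks near -1, 0, and +1 (assumed shapes)
w = torch.cat([
    -1 + 0.05 * torch.randn(4, 8, 3, 3),
    0.05 * torch.randn(4, 8, 3, 3),
    1 + 0.05 * torch.randn(4, 8, 3, 3),
], dim=1).requires_grad_(True)

w_b = BinarizeSTE.apply(w)
print(w_b.abs().max())   # roughly 0.7: much larger than the near-zero latents
w_b.sum().backward()     # the pass-through gradient is what updates w
```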
We aim to introduce a Resilient Binary Neural Network (ReBNN) [258] to address the
problem above. The intuition of our work is to relearn the channel-wise scaling factor and the
latent weights in a unified framework. Consequently, we propose parameterizing the scaling
factor and introducing a weighted reconstruction loss to build an adaptive training objective.
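As a rough sketch of what such an adaptive objective can look like, the PyTorch module below treats the channel-wise scaling factor as a learnable parameter and adds a reconstruction term between the binarized and latent weights to the task loss. The module name, the fixed coefficient recon_weight, and the way the extra loss is collected are assumptions for illustration; ReBNN [258] balances this term adaptively rather than with a fixed coefficient.

```python
# Illustrative sketch, not the released ReBNN code: the scaling factor alpha is
# a learnable (parameterized) per-channel parameter trained jointly with the
# latent weights, and a reconstruction loss ties alpha*sign(w) back to w.
# The fixed coefficient `recon_weight` is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParamScaledBinaryConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k, stride=1, padding=0, recon_weight=1e-4):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_ch, in_ch, k, k))
        self.alpha = nn.Parameter(torch.ones(out_ch, 1, 1, 1))  # learnable scale
        self.stride, self.padding = stride, padding
        self.recon_weight = recon_weight

    def forward(self, x):
        w = self.weight
        # forward: alpha * sign(w); backward: straight-through, scaled by alpha
        w_b = self.alpha * ((torch.sign(w) - w).detach() + w)
        # weighted reconstruction loss between binarized and latent weights
        self.recon_loss = self.recon_weight * F.mse_loss(w_b, w)
        return F.conv2d(x, w_b, stride=self.stride, padding=self.padding)


layer = ParamScaledBinaryConv2d(16, 32, 3, padding=1)
out = layer(torch.randn(2, 16, 8, 8))
loss = out.mean() + layer.recon_loss  # stand-in task loss + reconstruction term
loss.backward()
```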
4A toy example of weight oscillation: from iteration t to t + 1, a misleading weight update causes
the weight to oscillate from −1 to 1, and from iteration t + 1 to t + 2 another misleading update flips it back from 1 to −1.